Customer Segmentation for an Indian Bank

Shuai Tan

  • Objective of My Report
  • Exploratory Data Analysis
  • RFM Model
  • K-means Clustering
  • Suggestions

Objective of My Report

Customer Segmentation

  • Better understand the current and potential users

    • Investment Behaviors?
    • Demographic Features?
  • Develop marketing strategies

    • Offer relevant deals and advertisements

Exploratory Data Analysis

         TransactionID  CustomerID  CustomerDOB  CustGender  CustLocation   CustAccountBalance  TransactionDate  TransactionAmount (INR)  CustomerAge
0        T1             C5841053    1994-01-10   0           JAMSHEDPUR     17819.05            2016-08-02       25.0                     22
1        T2             C2142763    1957-04-04   1           JHAJJAR        2270.69             2016-08-02       27999.0                  59
2        T3             C4417068    1996-11-26   0           MUMBAI         17874.44            2016-08-02       459.0                    20
3        T4             C5342380    1973-09-14   0           MUMBAI         866503.21           2016-08-02       2060.0                   43
4        T5             C9031234    1988-03-24   0           NAVI MUMBAI    6714.43             2016-08-02       1762.5                   28
...      ...            ...         ...          ...         ...            ...                 ...              ...                      ...
1048562  T1048563       C8020229    1990-04-08   1           NEW DELHI      7635.19             2016-09-18       799.0                    26
1048563  T1048564       C6459278    1992-02-20   1           NASHIK         27311.42            2016-09-18       460.0                    24
1048564  T1048565       C6412354    1989-05-18   1           HYDERABAD      221757.06           2016-09-18       770.0                    27
1048565  T1048566       C6420483    1978-08-30   1           VISAKHAPATNAM  10117.87            2016-09-18       1000.0                   38
1048566  T1048567       C8337524    1984-03-05   1           PUNE           75734.42            2016-09-18       1166.0                   32

984614 rows × 9 columns

The raw dataset contains over 1 million transactions by over 800k customers of a bank in India; 984,614 of these transactions appear in the cleaned table above. It covers nearly three months of 2016, from August 1 to October 21.

1. Age Distribution

The majority of customers are between 20 and 40 years old, with the peak at age 26.
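The CustomerAge column in the sample table is consistent with a simple year difference: customer C4417068, born 1996-11-26, is listed as 20 on 2016-08-02, although they were still 19 on that date. A minimal pandas sketch under that assumption, using five rows from the table; the decade bins are an illustrative choice, not taken from the notebook:

```python
import pandas as pd

# Five rows copied from the sample table above
df = pd.DataFrame({
    "CustomerID": ["C5841053", "C2142763", "C4417068", "C5342380", "C9031234"],
    "CustomerDOB": pd.to_datetime(["1994-01-10", "1957-04-04", "1996-11-26",
                                   "1973-09-14", "1988-03-24"]),
    "TransactionDate": pd.to_datetime(["2016-08-02"] * 5),
})

# Assumed definition: age = transaction year - birth year (birthday not considered)
df["CustomerAge"] = df["TransactionDate"].dt.year - df["CustomerDOB"].dt.year

# Bucket ages into decades to summarize the distribution (bin edges are illustrative)
age_groups = pd.cut(df["CustomerAge"], bins=[0, 20, 30, 40, 50, 60, 120], right=False)
print(df[["CustomerID", "CustomerAge"]])
print(age_groups.value_counts().sort_index())
```

On the full table, the same `pd.cut` counts (or a histogram of CustomerAge) would show where the 20 to 40 concentration lies.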

2. Gender Distribution

The number of male customers is almost three times that of female customers, so the customer base is dominated by men.

3. The 20 Most Frequent Locations

Mumbai has the most transactions; outside the top five locations, every other location records far fewer transactions than Mumbai.
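A minimal sketch of how the location ranking could be produced with pandas `value_counts`, on a hypothetical ten-row sample of the CustLocation column; expressing each count relative to Mumbai is an illustrative way to read the comparison, not necessarily what the original plot showed:

```python
import pandas as pd

# Hypothetical sample of the CustLocation column (the real table has ~1M rows)
tx = pd.DataFrame({"CustLocation": [
    "MUMBAI", "MUMBAI", "MUMBAI", "NEW DELHI", "NEW DELHI",
    "PUNE", "NASHIK", "HYDERABAD", "JHAJJAR", "JAMSHEDPUR",
]})

# Transactions per location, sorted from most to least frequent
counts = tx["CustLocation"].value_counts()
top20 = counts.head(20)

# Express each location's transaction count as a fraction of Mumbai's
relative_to_mumbai = (top20 / counts["MUMBAI"]).round(2)
print(relative_to_mumbai)
```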

RFM Model

  • Recency: number of days since the last purchase or order.
  • Frequency: number of orders within a given period (for instance, purchases per month).
  • Monetary value: total order amount during a specific time frame.
           count     mean         std          min   25%    50%    75%     max
Frequency  838561.0  1.174171     0.434989     1.00  1.0    1.0    1.0     6.00
Monetary   838561.0  1706.621763  6689.594162  0.01  199.0  500.0  1420.0  1560034.99
Recency    838561.0  55.407019    15.219939    0.00  43.0   55.0   68.0    81.00
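A minimal pandas sketch of how the three RFM features could be computed from the transaction table, on hypothetical data. It assumes Frequency is the number of transactions per customer over the whole period, Monetary is the sum of TransactionAmount (INR), and Recency is measured in days back from the latest transaction date in the data; with data running from August 1 to October 21, that last choice would cap Recency at 81 days, as in the summary above.

```python
import pandas as pd

# Hypothetical transactions; column names follow the transaction table above
tx = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2", "C3", "C3", "C3"],
    "TransactionDate": pd.to_datetime(["2016-08-01", "2016-09-10", "2016-08-15",
                                       "2016-08-20", "2016-10-01", "2016-10-21"]),
    "TransactionAmount (INR)": [100.0, 250.0, 40.0, 500.0, 120.0, 80.0],
})

# Assumption: recency is counted from the latest date present in the data
snapshot = tx["TransactionDate"].max()

rfm = tx.groupby("CustomerID").agg(
    Recency=("TransactionDate", lambda d: (snapshot - d.max()).days),  # days since last txn
    Frequency=("TransactionDate", "count"),                            # txns in the period
    Monetary=("TransactionAmount (INR)", "sum"),                       # total amount
)
print(rfm)
print(rfm.describe().T)  # same layout as the summary table above
```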

Monetary vs. Recency

For the majority of customers, monetary values are low (median 500 INR, 75th percentile 1,420 INR). This suggests that most of the bank's customers may be lower-income account holders who use their accounts mainly to deposit money.

Frequency


Over the roughly three-month period, nearly all customers transacted with the bank only once: at least 75% of customers have a frequency of 1, and the mean frequency is about 1.17.

Recency

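The recency boxplot points to the same conclusion as the frequency plot: the median customer's last transaction was 55 days before the end of the period, and a quarter of customers had not transacted for 68 days or more.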

K-Means

  • K: Number of clusters
  • Means: Average values of features of interest
    • Centroid: each cluster is represented by its center, the centroid (the mean of the points assigned to it).
  • Distance: each data point is assigned to the cluster whose centroid is closest to it
    • Euclidean distance: $d = \sqrt{(x_1-x_0)^2+(y_1-y_0)^2}$
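The bullets above describe the standard Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until nothing changes. A minimal NumPy sketch, not necessarily what the notebook used; the Euclidean distance here is the n-feature version of the 2-D formula above, and the helper names are hypothetical:

```python
import numpy as np

def nearest_centroid(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid for each row of X (Euclidean distance)."""
    # dists[i, j] = sqrt(sum over features of (X[i] - centroids[j]) ** 2)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = nearest_centroid(X, centroids)
    inertia = float(((X - centroids[labels]) ** 2).sum())  # sum of squared distances
    return labels, centroids, inertia

# Usage on two obvious groups of 2-D points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, inertia = kmeans(X, k=2)
print(labels, centroids, round(inertia, 4))
```

The returned inertia is the quantity the elbow method below plots against K.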

Visualization

How Many Clusters?

  • Elbow Method: a simple and widely used heuristic for choosing the number of clusters in a K-means model.
    • Inertia: the sum of squared distances from each point to its nearest centroid after K-means clustering finishes
    • Rule of thumb: plot inertia against K and pick the K at the elbow point, where adding more clusters stops reducing inertia substantially
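A minimal sketch of the elbow method with scikit-learn's `KMeans`, whose `inertia_` attribute is the sum of squared distances described above. The data here are synthetic (three well-separated blobs, an assumption for illustration); on the bank data, `X` would be the standardized feature table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated blobs of 50 points each in 2-D
rng = np.random.default_rng(42)
centers = np.array([[0, 0], [8, 0], [0, 8]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

inertias = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # sum of squared distances to the nearest centroid

for k, val in inertias.items():
    print(k, round(val, 1))
# Inertia falls sharply up to k = 3 and only slightly after that: the "elbow"
```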

Data Preprocessing: Standard Scaler

        Frequency  CustGender  CustAccountBalance  Monetary  CustomerAge  Recency
0       2          0           120180.54           5106.0    24           25
1       1          1           24204.49            1499.0    22           68
2       2          0           161848.76           1455.0    24           75
3       1          0           496.18              30.0      26           36
4       1          1           87058.65            5000.0    51           64
...     ...        ...         ...                 ...       ...          ...
838556  1          1           133067.23           691.0     26           75
838557  1          1           96063.46            222.0     20           36
838558  1          1           5559.75             126.0     23           64
838559  1          1           35295.92            50.0      21           54
838560  1          1           6968.93             855.0     34           26

838561 rows × 6 columns

After standardization (each column rescaled to zero mean and unit variance):

        Frequency  CustGender  CustAccountBalance  Monetary   CustomerAge  Recency
0        1.898505  -1.615003    0.016841            0.508159  -0.803137    -1.997842
1       -0.400403   0.619194   -0.098354           -0.031037  -1.031624     0.827401
2        1.898505  -1.615003    0.066852           -0.037614  -0.803137     1.287324
3       -0.400403  -1.615003   -0.126810           -0.250631  -0.574651    -1.275106
4       -0.400403   0.619194   -0.022914            0.492314   2.281429     0.564587
...     ...        ...         ...                 ...        ...          ...
838556  -0.400403   0.619194    0.032308           -0.151821  -0.574651     1.287324
838557  -0.400403   0.619194   -0.012106           -0.221930  -1.260110    -1.275106
838558  -0.400403   0.619194   -0.120732           -0.236281  -0.917380     0.564587
838559  -0.400403   0.619194   -0.085042           -0.247642  -1.145867    -0.092446
838560  -0.400403   0.619194   -0.119041           -0.127306   0.339295    -1.932139

838561 rows × 6 columns
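Standardization computes $z = (x - \mu) / \sigma$ per column. The values above can be spot-checked against the RFM summary table: customer 0's Frequency of 2 gives (2 - 1.174171) / 0.434989 ≈ 1.8985, and Recency 25 gives ≈ -1.9978, matching row 0. A minimal sketch; that the notebook used scikit-learn's `StandardScaler` is an assumption based on the heading, and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score every column: (x - column mean) / column std.

    ddof=0 (population std) matches what sklearn's StandardScaler uses."""
    return (df - df.mean()) / df.std(ddof=0)

# Hypothetical toy features (not the bank data)
toy = pd.DataFrame({"Frequency": [1, 2, 1, 3], "Recency": [10, 40, 70, 20]})
manual = standardize(toy)
scaled = StandardScaler().fit_transform(toy)  # returns a NumPy array

# Spot-check row 0 of the standardized table with the summary statistics above
z_freq = (2 - 1.174171) / 0.434989    # Frequency 2 -> about 1.8985
z_rec = (25 - 55.407019) / 15.219939  # Recency 25 -> about -1.9978
print(np.allclose(manual.to_numpy(), scaled), round(z_freq, 4), round(z_rec, 4))
```

Scaling matters here because K-means uses Euclidean distance: unscaled, CustAccountBalance (values in the hundreds of thousands) would dominate features such as Frequency (1 to 6).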

Elbow Method Plot

Agglomerative Clustering

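Agglomerative clustering is also distance-based: it starts with every data point as its own cluster and repeatedly merges the two closest clusters, building a tree (dendrogram).
  • How different: the length of a branch in the dendrogram represents the distance between the clusters it joins.
  • How many: anywhere from one cluster up to one cluster per data point, depending on where we draw the partition line across the tree.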
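A minimal sketch of the merge tree and the "partition line" using SciPy's `scipy.cluster.hierarchy`, on six hypothetical points; Ward linkage is one common choice and an assumption here, not confirmed as the notebook's setting. Passing `criterion="maxclust"` to `fcluster` instead asks for a fixed number of clusters. Note that full linkage over 838,561 customers would require hundreds of billions of pairwise distances, so in practice such a tree is usually built on a sample.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical standardized features for six customers forming two tight groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Bottom-up merging; Ward linkage is one common choice (an assumption here)
Z = linkage(X, method="ward")
# Each row of Z is one merge: [cluster a, cluster b, merge distance, new cluster size]
print(Z)

# "Drawing the partition line" = cutting the tree at some height
two_groups = fcluster(Z, t=1.0, criterion="distance")  # cut above the small merges
singletons = fcluster(Z, t=0.1, criterion="distance")  # cut below every merge
print(two_groups)  # two clusters
print(singletons)  # one cluster per data point
```

`scipy.cluster.hierarchy.dendrogram(Z)` draws the tree itself.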

Agglomerative Clustering Method

label  customers
2      235782
4      224744
1      181287
6      122568
0      70932
5      3115
3      133
3 133

Clusters 3 and 5 are very small: together they contain only 3,248 customers, about 0.4% of the 838,561 total.
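The cluster shares follow directly from the label counts. A short pandas check, using the sizes copied from the table (the variable names are illustrative):

```python
import pandas as pd

# Cluster sizes copied from the table above (label -> number of customers)
sizes = pd.Series({2: 235782, 4: 224744, 1: 181287, 6: 122568,
                   0: 70932, 5: 3115, 3: 133}, name="customers")

total = sizes.sum()                                  # every customer gets one label
shares = (sizes / total * 100).round(2)              # percent of all customers
small_share = sizes.loc[[3, 5]].sum() / total * 100  # the two tiny clusters combined
print(total)
print(shares)
print(round(small_share, 2))
```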

The vast majority of our customers fall into Clusters 0, 1, 2, 4, and 6, which will be our main focus.

  • Clusters 1, 2, and 4 share similar characteristics, including a frequency of 1, and differ mainly in gender. Therefore, we group them together for now.
  • Cluster 6 has a higher frequency of transactions with the bank compared to Clusters 1, 2, and 4, but is otherwise similar.
  • Cluster 0 consists of an older and more affluent population.

My Suggestions for the Indian Bank

Since nearly all customers of this bank have low account balances, I suggest the following:

Clusters 1, 2, and 4: Since these clusters behave similarly (their customers transact only once) and differ mainly in gender, the bank could recommend a credit card with rewards or cash back. This would give customers an incentive to use the card frequently and could help increase their account balances over time.

Cluster 0: Given that this group is older and more affluent, the bank could recommend a savings account with a higher interest rate. This would appeal to customers who are more financially stable and may be looking for a low-risk investment option.

Cluster 6: Since this group transacts with the bank more frequently, the bank could recommend a checking account with no monthly maintenance fees. This would be an attractive option for customers who use their checking account often and want to avoid additional fees. The bank could also offer overdraft protection so that customers do not incur fees for overdrawing their account.

Thank You!